AITopics | unstructured text



Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Conflict-Aware Knowledge Editing in the Wild: Semantic-Augmented Graph Representation for Unstructured Text

Neural Information Processing Systems, Jun-14-2026, 06:41:55 GMT

Large Language Models (LLMs) have demonstrated broad applications but suffer from issues like hallucinations, erroneous outputs and outdated knowledge. Model editing emerges as an effective solution to refine knowledge in LLMs, yet existing methods typically depend on structured knowledge representations. However, real-world knowledge is primarily embedded within complex, unstructured text. Existing structured knowledge editing approaches face significant challenges when handling the entangled and intricate knowledge present in unstructured text, resulting in issues such as representation ambiguity and editing conflicts. To address these challenges, we propose a Conflict-Aware Knowledge Editing in the Wild (CAKE) framework, the first framework explicitly designed for editing knowledge extracted from wild unstructured text. CAKE comprises two core components: a Semantic-augmented Graph Representation module and a Conflict-aware Knowledge Editing strategy. The Semantic-augmented Graph Representation module enhances knowledge encoding through structural disambiguation, relational enrichment, and semantic diversification. Meanwhile, the Conflict-aware Knowledge Editing strategy utilizes a graph-theoretic coloring algorithm to disentangle conflicted edits by allocating them to orthogonal parameter subspaces, thereby effectively mitigating editing conflicts. Experimental results on the AKEW benchmark demonstrate that CAKE significantly outperforms existing methods, achieving a 15.43% improvement in accuracy on llama3 editing tasks.
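The coloring idea mentioned in the abstract above can be illustrated with a minimal sketch. This is not the CAKE algorithm itself: it assumes a plain greedy graph coloring, where each edit is a node and each pair of conflicting edits is an edge, so that edits sharing a color could in principle be routed to the same parameter subspace. The function names and the example conflict list are hypothetical.

```python
from collections import defaultdict


def color_conflicting_edits(num_edits, conflicts):
    """Greedy graph coloring: edits are nodes, conflicts are edges.

    Returns a dict mapping each edit index to a color index such that
    no two conflicting edits share a color.
    """
    neighbors = defaultdict(set)
    for a, b in conflicts:
        neighbors[a].add(b)
        neighbors[b].add(a)

    colors = {}
    # Visit edits with the most conflicts first (a common greedy heuristic).
    order = sorted(range(num_edits), key=lambda e: (-len(neighbors[e]), e))
    for edit in order:
        used = {colors[n] for n in neighbors[edit] if n in colors}
        color = 0
        while color in used:
            color += 1
        colors[edit] = color
    return colors


def group_by_color(colors):
    """Collect edits that share a color into one group."""
    groups = defaultdict(list)
    for edit, color in sorted(colors.items()):
        groups[color].append(edit)
    return dict(groups)


# Hypothetical example: five edits; edits 0, 1, 2 conflict pairwise, as do 3 and 4.
conflicts = [(0, 1), (1, 2), (0, 2), (3, 4)]
colors = color_conflicting_edits(5, conflicts)
groups = group_by_color(colors)
```

Each resulting group contains only mutually non-conflicting edits. How a group is mapped to an orthogonal parameter subspace is specific to the paper and is not modeled here.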

artificial intelligence, large language model, natural language, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)


A systematic review of trial-matching pipelines using large language models

Morrison, Braxton A., Sushil, Madhumita, Young, Jacob S.

arXiv.org Artificial Intelligence, Sep-25-2025

Matching patients to clinical trial options is critical for identifying novel treatments, especially in oncology. However, manual matching is labor-intensive and error-prone, leading to recruitment delays. Pipelines incorporating large language models (LLMs) offer a promising solution. We conducted a systematic review of studies published between 2020 and 2025 from three academic databases and one preprint server, identifying LLM-based approaches to clinical trial matching. Of 126 unique articles, 31 met inclusion criteria. Reviewed studies focused on matching patient-to-criterion only (n=4), patient-to-trial only (n=10), trial-to-patient only (n=2), binary eligibility classification only (n=1) or combined tasks (n=14). Sixteen used synthetic data; fourteen used real patient data; one used both. Variability in datasets and evaluation metrics limited cross-study comparability. In studies with direct comparisons, the GPT-4 model consistently outperformed other models, even fine-tuned ones, in matching and eligibility extraction, albeit at higher cost. Promising strategies included zero-shot prompting with proprietary LLMs like the GPT-4o model, advanced retrieval methods, and fine-tuning smaller, open-source models for data privacy when incorporation of large models into hospital infrastructure is infeasible. Key challenges include accessing sufficiently large real-world data sets, and deployment-associated challenges such as reducing cost, mitigating risk of hallucinations, data leakage, and bias. This review synthesizes progress in applying LLMs to clinical trial matching, highlighting promising directions and key limitations. Standardized metrics, more realistic test sets, and attention to cost-efficiency and fairness will be critical for broader deployment.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2509.19327

Country: North America > United States > California > San Francisco County > San Francisco (0.29)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


A Modular Unsupervised Framework for Attribute Recognition from Unstructured Text

Solaiman, KMA

arXiv.org Artificial Intelligence, Jul-8-2025

We propose POSID, a modular, lightweight and on-demand framework for extracting structured attribute-based properties from unstructured text without task-specific fine-tuning. While the method is designed to be adaptable across domains, in this work, we evaluate it on human attribute recognition in incident reports. POSID combines lexical and semantic similarity techniques to identify relevant sentences and extract attributes. We demonstrate its effectiveness on a missing person use-case using the InciText dataset, achieving effective attribute extraction without supervised training. Attribute recognition from unstructured text is important in many domains, including human descriptions in incident reports, product descriptions, and more.

artificial intelligence, natural language, recognition, (15 more...)

arXiv.org Artificial Intelligence

2507.03949

Country: North America > United States (0.68)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)


Enhancements for Developing a Comprehensive AI Fairness Assessment Standard

Agarwal, Avinash, Kumar, Mayashankar, Nene, Manisha J.

arXiv.org Artificial Intelligence, Apr-11-2025

As AI systems increasingly influence critical sectors like telecommunications, finance, healthcare, and public services, ensuring fairness in decision-making is essential to prevent biased or unjust outcomes that disproportionately affect vulnerable entities or result in adverse impacts. This need is particularly pressing as the industry approaches the 6G era, where AI will drive complex functions like autonomous network management and hyper-personalized services. However, as AI applications diversify, this standard requires enhancement to strengthen its impact and broaden its applicability. This paper proposes an expansion of the TEC Standard to include fairness assessments for images, unstructured text, and generative AI, including large language models, ensuring a more comprehensive approach that keeps pace with evolving AI technologies. By incorporating these dimensions, the enhanced framework will promote responsible and trustworthy AI deployment across various sectors. The widespread adoption of Artificial Intelligence (AI) and Machine Learning (ML) technologies has driven transformative advancements across critical sectors, including telecommunications, healthcare, finance, and public services.

assessment, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/COMSNETS63942.2025.10885551

2504.07516

Country: Asia > India > NCT (0.14)

Genre: Research Report (0.40)

Industry:

Information Technology (0.69)
Health & Medicine (0.55)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)


Think Inside the JSON: Reinforcement Strategy for Strict LLM Schema Adherence

Agarwal, Bhavik, Joshi, Ishan, Rojkova, Viktoria

arXiv.org Artificial Intelligence, Feb-18-2025

In this paper, we address the challenge of enforcing strict schema adherence in large language model (LLM) generation by leveraging LLM reasoning capabilities. Building on the DeepSeek R1 reinforcement learning framework, our approach trains structured reasoning skills of a 1.5B parameter model through a novel pipeline that combines synthetic reasoning dataset construction with custom reward functions under Group Relative Policy Optimization (GRPO). Specifically, we first perform R1 reinforcement learning on a 20K sample unstructured-to-structured dataset, mirroring the original DeepSeek R1 methods, to establish core reasoning abilities. Subsequently, we perform supervised fine-tuning on a separate 10K reasoning sample dataset, focusing on refining schema adherence for downstream tasks. Despite the relatively modest training scope, requiring approximately 20 hours on an 8xH100 GPU cluster for GRPO training and 3 hours on 1xA100 for SFT, our model demonstrates robust performance in enforcing schema consistency. We compare our ThinkJSON approach against the original DeepSeek R1 (671B), distilled versions of DeepSeek R1 (Qwen-1.5B and Qwen-7B), and Gemini 2.0 Flash (70B), showcasing its effectiveness in real-world applications. Our results underscore the practical utility of a resource-efficient framework for schema-constrained text generation.
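To make the notion of a custom reward for schema adherence concrete, here is a minimal sketch. It is an illustration only, not the reward used in the paper: the scoring rule (fraction of required fields present with the expected type, halved when unexpected extra keys appear) and the names `schema_reward` and `required_fields` are assumptions introduced for this example.

```python
import json


def schema_reward(output_text, required_fields):
    """Score a model output in [0, 1] for adherence to a flat schema.

    required_fields maps field name -> expected Python type (an assumed,
    simplified stand-in for a real JSON Schema).
    """
    try:
        parsed = json.loads(output_text)
    except json.JSONDecodeError:
        return 0.0  # not valid JSON at all
    if not isinstance(parsed, dict):
        return 0.0  # valid JSON, but not an object
    if not required_fields:
        return 1.0

    # Count required fields that are present with the expected type.
    matched = sum(
        1
        for name, expected_type in required_fields.items()
        if name in parsed and isinstance(parsed[name], expected_type)
    )
    score = matched / len(required_fields)

    # Penalize keys the schema does not define (strict adherence).
    if set(parsed) - set(required_fields):
        score *= 0.5
    return score


# Hypothetical schema and outputs.
schema = {"name": str, "age": int}
full = schema_reward('{"name": "Ada", "age": 36}', schema)
partial = schema_reward('{"name": "Ada"}', schema)
invalid = schema_reward("not json", schema)
```

In a GRPO-style setup, a scalar like this would be computed per sampled completion and compared within the group. A production reward would more likely validate against a full JSON Schema, including nested objects.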

json, schema, schema adherence, (13 more...)

arXiv.org Artificial Intelligence

2502.14905

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Reflections from the 2024 Large Language Model (LLM) Hackathon for Applications in Materials Science and Chemistry

Zimmermann, Yoel, Bazgir, Adib, Afzal, Zartashia, Agbere, Fariha, Ai, Qianxiang, Alampara, Nawaf, Al-Feghali, Alexander, Ansari, Mehrad, Antypov, Dmytro, Aswad, Amro, Bai, Jiaru, Baibakova, Viktoriia, Biswajeet, Devi Dutta, Bitzek, Erik, Bocarsly, Joshua D., Borisova, Anna, Bran, Andres M, Brinson, L. Catherine, Calderon, Marcel Moran, Canalicchio, Alessandro, Chen, Victor, Chiang, Yuan, Circi, Defne, Charmes, Benjamin, Chaudhary, Vikrant, Chen, Zizhang, Chiu, Min-Hsueh, Clymo, Judith, Dabhadkar, Kedar, Daelman, Nathan, Datar, Archit, de Jong, Wibe A., Evans, Matthew L., Fard, Maryam Ghazizade, Fisicaro, Giuseppe, Gangan, Abhijeet Sadashiv, George, Janine, Gonzalez, Jose D. Cojal, Götte, Michael, Gupta, Ankur K., Harb, Hassan, Hong, Pengyu, Ibrahim, Abdelrahman, Ilyas, Ahmed, Imran, Alishba, Ishimwe, Kevin, Issa, Ramsey, Jablonka, Kevin Maik, Jones, Colin, Josephson, Tyler R., Juhasz, Greg, Kapoor, Sarthak, Kang, Rongda, Khalighinejad, Ghazal, Khan, Sartaaj, Klawohn, Sascha, Kuman, Suneel, Ladines, Alvin Noe, Leang, Sarom, Lederbauer, Magdalena, Liao, Sheng-Lun, Liu, Hao, Liu, Xuefeng, Lo, Stanley, Madireddy, Sandeep, Maharana, Piyush Ranjan, Maheshwari, Shagun, Mahjoubi, Soroush, Márquez, José A., Mills, Rob, Mohanty, Trupti, Mohr, Bernadette, Moosavi, Seyed Mohamad, Moßhammer, Alexander, Naghdi, Amirhossein D., Naik, Aakash, Narykov, Oleksandr, Näsström, Hampus, Nguyen, Xuan Vu, Ni, Xinyi, O'Connor, Dana, Olayiwola, Teslim, Ottomano, Federico, Ozhan, Aleyna Beste, Pagel, Sebastian, Parida, Chiku, Park, Jaehee, Patel, Vraj, Patyukova, Elena, Petersen, Martin Hoffmann, Pinto, Luis, Pizarro, José M., Plessers, Dieter, Pradhan, Tapashree, Pratiush, Utkarsh, Puli, Charishma, Qin, Andrew, Rajabi, Mahyar, Ricci, Francesco, Risch, Elliot, Ríos-García, Martiño, Roy, Aritra, Rug, Tehseen, Sayeed, Hasan M, Scheidgen, Markus, Schilling-Wilhelmi, Mara, Schloz, Marcel, Schöppach, Fabian, Schumann, Julia, Schwaller, Philippe, Schwarting, Marcus, Sharlin, Samiha, Shen, Kevin, Shi, Jiale, Si, Pradip, D'Souza, Jennifer, Sparks, Taylor, Sudhakar, Suraj, Talirz, Leopold, Tang, Dandan, Taran, Olga, Terboven, Carla, Tropin, Mark, Tsymbal, Anastasiia, Ueltzen, Katharina, Unzueta, Pablo Andres, Vasan, Archit, Vinchurkar, Tirtha, Vo, Trung, Vogel, Gabriel, Völker, Christoph, Weinreich, Jan, Yang, Faradawn, Zaki, Mohd, Zhang, Chi, Zhang, Sylvester, Zhang, Weijie, Zhu, Ruijie, Zhu, Shang, Janssen, Jan, Li, Calvin, Foster, Ian, Blaiszik, Ben

arXiv.org Artificial Intelligence, Jan-2-2025

Here, we present the outcomes from the second Large Language Model (LLM) Hackathon for Applications in Materials Science and Chemistry, which engaged participants across global hybrid locations, resulting in 34 team submissions. The submissions spanned seven key application areas and demonstrated the diverse utility of LLMs for applications in (1) molecular and material property prediction; (2) molecular and material design; (3) automation and novel interfaces; (4) scientific communication and education; (5) research data management and automation; (6) hypothesis generation and evaluation; and (7) knowledge extraction and reasoning from scientific literature. Each team submission is presented in a summary table with links to the code and as brief papers in the appendix. Beyond team results, we discuss the hackathon event and its hybrid format, which included physical hubs in Toronto, Montreal, San Francisco, Berlin, Lausanne, and Tokyo, alongside a global online hub to enable local and virtual collaboration. Overall, the event highlighted significant improvements in LLM capabilities since the previous year's hackathon, suggesting continued expansion of LLMs for applications in materials science and chemistry research. These outcomes demonstrate the dual utility of LLMs as both multipurpose models for diverse machine learning tasks and platforms for rapid prototyping custom applications in scientific research.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2411.15221

Country:

North America > Canada > Ontario > Toronto (0.34)
North America > Canada > Quebec > Montreal (0.34)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.24)
(2 more...)

Genre:

Workflow (1.00)
Research Report > Promising Solution (1.00)
Research Report > New Finding (1.00)
(3 more...)

Industry:

Materials > Construction Materials (1.00)
Information Technology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
(11 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.45)


UnKE: Unstructured Knowledge Editing in Large Language Models

Deng, Jingcheng, Wei, Zihao, Pang, Liang, Ding, Hanxing, Shen, Huawei, Cheng, Xueqi

arXiv.org Artificial Intelligence, May-24-2024

Recent knowledge editing methods have primarily focused on modifying structured knowledge in large language models, heavily relying on the assumption that structured knowledge is stored as key-value pairs locally in MLP layers or specific neurons. However, this task setting overlooks the fact that a significant portion of real-world knowledge is stored in an unstructured format, characterized by long-form content, noise, and a complex yet comprehensive nature. The "knowledge locating" and "term-driven optimization" techniques derived from the assumption used in previous methods (e.g., MEMIT) are ill-suited for unstructured knowledge. To address these challenges, we propose a novel unstructured knowledge editing method, namely UnKE, which extends previous assumptions in the layer dimension and token dimension. Firstly, in the layer dimension, we discard the "knowledge locating" step and treat the first few layers as the key, which expands knowledge storage across layers to break the "knowledge stored locally" assumption. Next, we replace "term-driven optimization" with "cause-driven optimization" across all inputted tokens in the token dimension, directly optimizing the last layer of the key generator to perform editing to generate the required key vectors. By utilizing key-value pairs at the layer level, UnKE effectively represents and edits complex and comprehensive unstructured knowledge, leveraging the potential of both the MLP and attention layers. Results on the newly proposed unstructured knowledge editing dataset (UnKEBench) and traditional structured datasets demonstrate that UnKE achieves remarkable performance, surpassing strong baselines.

editing, knowledge, knowledge editing, (15 more...)

arXiv.org Artificial Intelligence

2405.15349

Country:

Asia > Singapore (0.04)
Asia > Indonesia > Bali (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(9 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Government (1.00)
Law (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)


Clinical Reasoning over Tabular Data and Text with Bayesian Networks

Rabaey, Paloma, Deleu, Johannes, Heytens, Stefan, Demeester, Thomas

arXiv.org Artificial Intelligence, May-23-2024

Bayesian networks are well-suited for clinical reasoning on tabular data, but are less compatible with natural language data, for which neural networks provide a successful framework. This paper compares and discusses strategies to augment Bayesian networks with neural text representations, both in a generative and discriminative manner. This is illustrated with simulation results for a primary care use case (diagnosis of pneumonia) and discussed in a broader clinical context.

bayesian network, reasoning, symptom, (15 more...)

arXiv.org Artificial Intelligence

2403.09481

Country: Europe > Belgium > Flanders > East Flanders > Ghent (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.89)
Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)


Automatic Knowledge Graph Construction for Judicial Cases

Zhou, Jie, Chen, Xin, Zhang, Hang, Li, Zhe

arXiv.org Artificial Intelligence, Apr-14-2024

In this paper, we explore the application of cognitive intelligence in legal knowledge, focusing on the development of judicial artificial intelligence. Utilizing natural language processing (NLP) as the core technology, we propose a method for the automatic construction of case knowledge graphs for judicial cases. Our approach centers on two fundamental NLP tasks: entity recognition and relationship extraction. We compare two pre-trained models for entity recognition to establish their efficacy. Additionally, we introduce a multi-task semantic relationship extraction model that incorporates translational embedding, leading to a nuanced contextualized case knowledge representation. Specifically, in a case study involving a "Motor Vehicle Traffic Accident Liability Dispute," our approach significantly outperforms the baseline model. The entity recognition F1 score improved by 0.36, while the relationship extraction F1 score increased by 2.37. Building on these results, we detail the automatic construction process of case knowledge graphs for judicial cases, enabling the assembly of knowledge graphs for hundreds of thousands of judgments. This framework provides robust semantic support for applications of judicial AI, including the precise categorization and recommendation of related cases.
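The abstract above mentions translational embedding without naming a variant. As a rough illustration, the sketch below assumes the basic TransE-style score, in which a triple (head, relation, tail) is plausible when head + relation lies close to tail in the embedding space. The vectors are made-up toy values, not learned embeddings, and the function name is hypothetical.

```python
import math


def transe_score(head, relation, tail):
    """L2 distance ||head + relation - tail||; lower means more plausible."""
    return math.sqrt(
        sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail))
    )


# Toy 2-dimensional embeddings (hypothetical values).
head = [1.0, 0.0]
relation = [0.0, 1.0]
tail_plausible = [1.0, 1.0]    # exactly head + relation
tail_implausible = [3.0, 5.0]  # far from head + relation

good = transe_score(head, relation, tail_plausible)
bad = transe_score(head, relation, tail_implausible)
```

During training, embeddings are typically adjusted so that observed triples score lower than corrupted ones. The multi-task model described in the paper adds components beyond this scoring function.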

knowledge graph, relation, vector, (16 more...)

arXiv.org Artificial Intelligence

2404.09416

Country:

Asia > China > Zhejiang Province (0.04)
Asia > China > Sichuan Province (0.04)
Asia > China > Jiangsu Province (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry: Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Pre-training Language Model Incorporating Domain-specific Heterogeneous Knowledge into A Unified Representation

Zhu, Hongyin, Peng, Hao, Lyu, Zhiheng, Hou, Lei, Li, Juanzi, Xiao, Jinghui

arXiv.org Artificial Intelligence, Mar-21-2024

Existing technologies expand BERT from different perspectives, e.g. designing different pre-training tasks, different semantic granularities, and different model architectures. Few models consider expanding BERT from different text formats. In this paper, we propose a heterogeneous knowledge language model (HKLM), a unified pre-trained language model (PLM) for all forms of text, including unstructured text, semi-structured text, and well-structured text. To capture the corresponding relations among this multi-format knowledge, our approach uses a masked language model objective to learn word knowledge, and uses a triple classification objective and a title matching objective to learn entity knowledge and topic knowledge respectively. To obtain the aforementioned multi-format text, we construct a corpus in the tourism domain and conduct experiments on 5 tourism NLP datasets. The results show that our approach outperforms the pre-training of plain text using only 1/4 of the data. We further pre-train the domain-agnostic HKLM and achieve performance gains on the XNLI dataset.

dataset, knowledge triple, model incorporating domain-specific heterogeneous knowledge, (11 more...)

arXiv.org Artificial Intelligence

2109.01048

Country:

Asia > China > Beijing > Beijing (0.05)
Asia > China > Yunnan Province > Kunming (0.04)
Asia > China > Guangdong Province (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Consumer Products & Services > Travel (0.71)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
